Psychological research shows that learning can be powerfully enhanced through testing, but this finding has so far been confined 
to memory tasks requiring verbal responses. We explored whether testing can enhance learning of visuospatial information in 
maps. Fifty subjects each studied 2 maps, one through conventional study, and the other through computer-prompted tests. For 
the tests, subjects were repeatedly presented with the same map with one feature deleted (e.g., a road or river), and tried to 
covertly recall the missing feature and its location. Subjects’ map drawings after 30 minutes were significantly better for maps 
learned through tests as compared to the same amount of time devoted to conventional study. These results suggest that the 
testing effect is not limited to the types of memory that require discrete, verbal responses, and that utilizing covert retrievals may 
allow the effect to be extended to a variety of complex nonverbal learning tasks. 



Many studies have shown that a memory test is 
useful not only for assessing memory, but also for 
improving memory. Research going back several 
decades has shown that tests can strengthen memory 
more than extra opportunities to restudy the material. 
This was first noted in studies looking at recall of word 
lists (e.g., Allen, Mahler, & Estes, 1969; Lachman & 
Laughery, 1968). It has also been found in foreign- 
language vocabulary learning. For example, when 
people were given 5 seconds to try to retrieve the 
English equivalent of an Eskimo word, and then both 
words were shown for an additional 5 seconds, 
subsequent memory was strengthened more than when 
the English and Eskimo words were both provided for 
the entire 10 second period (Carrier & Pashler, 1992). 

This benefit of testing over restudying — which will 
be referred to here as the testing effect, but is also 
sometimes referred to as retrieval practice — has been 
further demonstrated with general knowledge facts 
(McDaniel & Fisher, 1991), face-name pairs (Carpenter 
& DeLosh, 2005; Landauer & Bjork, 1978), text 
passages (Nungester & Duchastel, 1982; Roediger & 
Karpicke, 2006a), paired associate verbal items 
(Carpenter, Pashler, & Vul, in press; Cull, 2000), and 
free recall of word lists (Carpenter & DeLosh, 2006; 
Kuo & Hirshman, 1996; Wheeler, Ewers, & Buonanno, 
2003). 

The potential utility of tests to improve students’ 
learning in educational contexts has sparked a lot of 
enthusiasm among psychologists (e.g., Bjork, 1988; 
Dempster, 1989, 1996; Glover, 1989; McDaniel & 
Einstein, 2005; Roediger & Karpicke, 2006b; Roediger 
& Marsh, 2005; Wheeler & Roediger, 1992). This 
potential was also noted by a National Research 
Council-sponsored review of the practical results of 5 
decades of learning and memory research (Druckman 
& Bjork, 1991, 1994). 

Limitations of the Testing Effect 

Efforts to harness the testing effect have so far been 
limited to memory tasks involving verbal responses. In 
both school and occupational contexts, however, people 
often learn information that is not well conveyed 
through words (e.g., Healy, King, & Sinclair, 1997; 
Wittman & Healy, 1995). Can testing also assist in 
learning visuospatially rich materials that do not require 
verbal responses? The current study explored this 
question with map learning. 

An immediate obstacle confronts efforts to 
implement testing with maps. Consider, for example, 
trying to learn landscape features such as rivers, 
highways, and buildings. While it is easy to collect and 
score brief verbal responses, it is not obvious how this 
could be done in a convenient way with map features. 
The method we used rests on 2 simple observations. 
First, there are some indications that the testing effect 
may not always require overt retrieval, based on two 
studies conducted in our lab using verbal materials 
(Carpenter, Pashler, & Vul, in press; Carpenter, Pashler, 
Wixted, & Vul, in preparation). Second, when a person 
is motivated to learn, they can score their own responses 
to determine which items require further study. 
Self-scoring is, after all, the basis for individuals’ successful 
use of flashcards. 

The method we used incorporates the potential for 
tests to benefit memory in both direct and indirect ways. 
Studies of the testing effect with verbal memory have 
shown, for example, that the act of retrieval per se may 
have some direct benefit on memory retention (e.g., 
Carpenter & DeLosh, 2006; Carrier & Pashler, 1992). 
Or, the act of retrieval may reveal which items have 
been sufficiently learned and which have not, thus 
resulting in a more efficient use of subsequent study 
time (e.g., Izawa, 1992). The degree to which these 
direct versus indirect benefits comprise the testing effect 
has not been clearly specified through past research, nor 
was it addressed in the current study. The main goal of 
the current study was to determine whether the testing 
effect could be obtained with a visuospatial task that 
does not require an overt verbal response. 

Subjects learned two maps — one through a test, and 
the other through an additional study opportunity. After 
30 minutes, they were asked to draw both maps as best 
they could. To motivate subjects to learn the maps in 
both conditions, we advised them that a $10 bonus 
would be given to anyone whose performance on the 
final test fell within the top 1/3 of all scores. 

Method 

Subjects and Materials 

Fifty-two undergraduates from UCSD participated 
for course credit. Each subject learned 2 maps 
containing 12 features such as roads, rivers, and 
buildings (see Figure 1). One map was learned through 
testing with feedback (the Test/Study condition), and the 
other map was learned through additional studying (the 
Study condition). Subjects finished learning one of the 
maps through one method (e.g., Study), before learning 
the other map through the other method (e.g., 
Test/Study). Four counterbalancing conditions were 
created: (1) Map A was presented for Study, followed 
by Map B for Test/Study, (2) Map A was presented for 
Test/Study, followed by Map B for Study, (3) Map B 
was presented for Study, followed by Map A for 
Test/Study, and (4) Map B was presented for 
Test/Study, followed by Map A for Study. Each subject 
was randomly assigned to 1 of these 4 counterbalancing 
conditions: 1 (n = 17), 2 (n = 11), 3 (n = 13), or 4 
(n = 11). 

Design and Procedure 

Subjects first read instructions that explained the 
Test/Study and Study procedures. They were then given 
practice with the Test/Study procedure using a display 
of colored shapes. After the practice phase, subjects 
were told that they would learn 2 maps, one using the 
Test/Study procedure they had practiced, and the other 
using the Study procedure. Before seeing the maps, 
subjects were advised that they would receive a later, 
unspecified memory test over both maps, and that if 
they scored within the top 1/3 of all subjects on this test, 
they would receive a $10 bonus. 

During Test/Study, subjects were first given 20 
seconds to study the map with all 12 features. Then, 
they were advised that they were about to be tested over 
the features of that map. Subjects pressed the spacebar 
to begin the test, and were shown an incomplete display 
of the map in which 1 of the 12 features was missing. 
Subjects were told to figure out what was missing and 
form a mental image of the missing feature in its 
location. When they had formed the image, they pressed 
the spacebar to see the complete map again. Subjects 
scored their own responses as correct or incorrect by 
pressing a button to indicate whether they successfully 
remembered the missing feature and its location. As 
soon as they responded, a 1-second blank screen 
appeared, and then the same map was shown again with 
a different feature missing. Once all 12 features were 
tested in random order, they were tested again in a new 
random order, and the Test/Study procedure continued 
in this fashion for 100 seconds. 
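
The test sequence described above (all 12 features tested once in random order, then again in a new random order, until time runs out) can be sketched as follows. This is only an illustration and not the software used in the experiment: the function name `build_test_schedule`, the feature labels, and the fixed `seconds_per_trial` value are assumptions introduced here, since trials in the actual procedure were self-paced.

```python
import random

def build_test_schedule(features, total_seconds=100.0, seconds_per_trial=5.0, seed=None):
    """Return the order in which features would be tested.

    Features are tested in successive random permutations until the time
    budget is used up. `seconds_per_trial` is a hypothetical fixed trial
    duration; in the experiment each trial was self-paced.
    """
    if not features or seconds_per_trial <= 0:
        raise ValueError("need at least one feature and a positive trial duration")
    rng = random.Random(seed)
    order, elapsed = [], 0.0
    while True:
        block = list(features)
        rng.shuffle(block)  # new random order for every pass through the map
        for feature in block:
            if elapsed + seconds_per_trial > total_seconds:
                return order
            order.append(feature)
            elapsed += seconds_per_trial

# Hypothetical labels standing in for the 12 map features.
features = [f"feature_{i}" for i in range(1, 13)]
order = build_test_schedule(features, seed=0)
```

Under the assumed 5 seconds per trial, the schedule contains one complete pass through the 12 features followed by a partial second pass, which is in the range of the trial counts reported in the Results.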

During Study, subjects were first given 20 seconds 
to study the map with all 12 features. Then, they were 
given a screen of instructions telling them that they 
would be given another opportunity to study the 
complete map again for 100 additional seconds, and this 
began when they pressed the spacebar. The total 
duration for the Test/Study and the Study was thus held 
constant at 120 seconds. 

After engaging in an unrelated visual attention task 
for 30 minutes, subjects were given a blank sheet of 
paper and instructed to draw both maps as best they 
could. One subject (from condition 2) drew only one of 
the maps, and another subject (from condition 4) drew 
the same map twice. Data from these 2 subjects were 
discarded, and all analyses were based on the 50 
remaining subjects. 
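
The sample sizes can be cross-checked against the degrees of freedom reported in the Results. The short sketch below uses only the numbers given in the text; the variable names are ours, and the error term is assumed to be the usual subjects-within-groups term for a design with one between-subjects factor.

```python
# Subjects assigned to each counterbalancing condition (from the Method section).
assigned = {1: 17, 2: 11, 3: 13, 4: 11}

# One subject each from conditions 2 and 4 was discarded.
excluded = {2: 1, 4: 1}

analyzed = {cond: n - excluded.get(cond, 0) for cond, n in assigned.items()}

n_assigned = sum(assigned.values())   # subjects who participated
n_analyzed = sum(analyzed.values())   # subjects entering the analyses

# Assumed error df for the within-subjects effect: N minus number of groups.
error_df = n_analyzed - len(analyzed)
```

With these values, `n_assigned` is 52, `n_analyzed` is 50, and `error_df` is 46, which agrees with the F(1, 46) tests reported for the final map drawings.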

Scoring the Map Drawings 

The map drawings were scored by an experimenter 
who was blind to the conditions the maps were assigned 
to. One point was awarded for each correct feature 
placed in the correct location, and one point was 
deducted for each extra feature, or a feature that did not 
belong in the map at all. The location of features was 
scored according to both absolute and relative accuracy. 

Absolute Accuracy. A point was awarded if the 
feature was placed within the correct quadrant (NE, 
NW, SE, SW), without respect to other features 
included in the drawing. For example, in Map A, a point 
was given for the church if it was drawn in the NE 
quadrant. In Map B, a point was given for the telephone 
if it was drawn in the SE quadrant. 

Figure 1. We created 2 maps, one of a town (upper panel), and one of a recreational area (lower panel). Each map contained 4 
roads, 2 rivers, and 6 additional features such as a school or golf course. The original maps contained colors such as blue (for the 
lake, ocean, and rivers), green (for the trees and golf course), and red (for the school and first aid station). Each subject learned 
both of the maps, one through completing a test with feedback (Test/Study) that lasted 120 seconds, and the other through 
engaging in additional study time (Study) for 120 seconds. Each subject was randomly assigned to one of four different 
counterbalancing conditions to learn both of the maps. 

According to liberal quadrant scoring (L-Q), roads 
and rivers were considered correct if they were drawn in 
any of their correct quadrants. For example, a point was 
given for the main north-south road in Map A if it was 
drawn in either the NW or SE quadrants. A point was 
given for the main north-south road in Map B if it was 
drawn in either the NW or SW quadrant. According to 
stringent quadrant scoring (S-Q), roads and rivers were 
considered correct only if they were located in all of 
their correct quadrants, and if roads were correctly 
connected. For example, a point was given for the main 
north-south road in Map A only if it was drawn through 
both NW and SE quadrants, and if it was connected to 
both east-west roads. A point was given for the main 
north-south road in Map B only if it was drawn through 
both NW and SW quadrants, and if it was connected to 
both east-west roads. Withholding points for failing to 
connect roads only applied if subjects included those 
roads in their drawings. 
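
The difference between liberal and stringent quadrant scoring for roads and rivers can be made concrete with a small sketch. It is a simplified reading of the rules above, not the scoring sheet actually used: the function name and arguments are hypothetical, the judgment about connections is passed in as a flag supplied by the scorer, and extra quadrants are assumed not to be penalized because the text does not address them.

```python
def score_road_or_river(drawn, correct, stringent=False, connections_ok=True):
    """Return 1 if the drawn road or river earns a point, else 0.

    drawn          -- quadrants the feature passes through in the drawing
    correct        -- quadrants it passes through on the original map
    stringent      -- False for L-Q scoring, True for S-Q scoring
    connections_ok -- scorer's judgment that the road is connected to every
                      required road that the subject actually drew
                      (hypothetical flag; only used for S-Q scoring)
    """
    drawn, correct = set(drawn), set(correct)
    if not stringent:
        # L-Q: any one of the correct quadrants is enough.
        return int(bool(drawn & correct))
    # S-Q: all correct quadrants must be covered and connections must hold.
    return int(correct <= drawn and connections_ok)

# Main north-south road in Map B: correct quadrants are NW and SW.
map_b_road = {"NW", "SW"}
liberal_partial = score_road_or_river({"NW"}, map_b_road)
stringent_partial = score_road_or_river({"NW"}, map_b_road, stringent=True)
stringent_full = score_road_or_river({"NW", "SW"}, map_b_road, stringent=True)
```

A road drawn only in the NW quadrant therefore earns the point under L-Q scoring but not under S-Q scoring, while a road drawn through both quadrants and properly connected earns it under both.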

Relative Accuracy. A point was awarded if the 
feature was placed within the correct relative position 
with respect to other features included in the drawing. 
For example, in Map A, a point was given for the 
houses if they were drawn south of the main east-west 
road and river, southwest of the church, and northeast of 
the airport. In Map B, a point was given for the 
telephone if it was drawn southwest of the first aid 
station, south of the lake, west of the lesser north-south 
road, and north of the southernmost east-west road. 

According to liberal relative position scoring (L- 
RP), roads and rivers were scored as correct if they were 
located in the correct relative position with respect to 
other features included in the drawing. For example, a 
point was given for the main east-west road in Map A if 
it was drawn north of the church, houses, or airport. A 
point was given for the main north-south road in Map B 
if it was drawn west of the restrooms or east of the golf 
course. According to stringent relative position scoring 
(S-RP), roads and rivers were scored as correct only if 
they were located in the correct relative position with 
respect to all other features included in the drawing, and 
if the roads were correctly connected. For example, a 
point was given for the main east-west road in Map A 
only if it was drawn from the north side of the airport to 
the north side of the church, and if it was connected to 
the main north-south road and the lesser east-west road. 
A point was given for the lesser north-south road in 
Map B only if it was drawn west of the trees but east of 
the telephone, first aid station and lake, and if it was 
connected to the southernmost east-west road. 
Withholding points for failing to connect roads only 
applied if subjects included those roads in their 
drawings. 

Results 

Performance during Test/Study 

Map B was easier to learn than Map A, based on 
subjects’ self scoring. This was true whether 
performance was based on the first 12 test trials — when 
each feature was tested for the first time — or on all trials 
during the Test/Study phase. For the first 12 trials, 
subjects scored an average of 82% on Map B 
(SD = 14%), and 73% on Map A (SD = 14%), 
t(48) = 2.11, p < .05. For all of the trials combined, 
subjects scored an average of 87% on Map B 
(SD = 12%), and 77% on Map A (SD = 14%), 
t(48) = 2.54, p < .02. Subjects completed a higher 
number of test trials for Map B (M = 22.15, SD = 8.32) 
than for Map A (M = 18.65, SD = 8.60), but this 
difference was not significant, t = 1.46. 

Accuracy of the Final Map Drawings 

To estimate inter-rater agreement, the final map 
drawings from 10 subjects were randomly selected and 
scored by an independent rater who was instructed on 
the scoring system but was blind to the conditions the 
maps were assigned to. The correlation between the 
accuracy scores of the two raters was .91 for Test/Study 
and .96 for Study according to L-Q scoring, .81 for 
Test/Study and .76 for Study according to S-Q scoring, 
.81 for Test/Study and .96 for Study according to L-RP 
scoring, and .94 for Test/Study and .81 for Study 
according to S-RP scoring (all p’s < .02). The 
following analyses were based on the accuracy of the 
final map drawings according to one rater. 

The Test/Study condition was significantly more 
beneficial than the Study condition by all four scoring 
procedures. According to a 2 x 4 mixed Analysis of 
Variance with test condition (Test/Study vs. Study) as 
the within-subjects factor and counterbalancing 
condition as the between-subjects factor, there was a 
significant main effect of test condition according to 
L-Q scoring, F(1, 46) = 10.85, p < .005, MSE = .026, 
S-Q scoring, F(1, 46) = 7.14, p < .02, MSE = .023, 
L-RP scoring, F(1, 46) = 4.12, p < .05, MSE = .024, 
and S-RP scoring, F(1, 46) = 4.82, p < .05, MSE = .022. 

Counterbalancing condition did not affect accuracy 
for any of the scoring procedures (all Fs < 1), nor did it 
interact with test condition for L-Q scoring, F = 1.08, 
S-Q scoring, F = .78, L-RP scoring, F = 2.10, or S-RP 
scoring, F = 1.18. Table 1 shows the means and 
standard errors for the Test/Study and Study conditions 
by all four scoring procedures. 
Table 1 

Mean proportion correct on final test as a function of scoring procedure, counterbalancing condition, and test 
condition. 

                                 Counterbalancing Condition
Scoring Procedure    1            2            3            4            Total

Test/Study
  L-Q                .82 (.04)    .68 (.06)    .73 (.05)    .75 (.06)    .74 (.03)
  S-Q                .69 (.04)    .56 (.05)    .58 (.05)    .63 (.05)    .61 (.02)
  L-RP               .76 (.05)    .61 (.06)    .65 (.05)    .72 (.06)    .68 (.03)
  S-RP               .68 (.04)    .55 (.06)    .58 (.05)    .63 (.06)    .61 (.03)

Study
  L-Q                .64 (.06)    .66 (.07)    .62 (.06)    .62 (.07)    .63 (.03)
  S-Q                .54 (.05)    .52 (.07)    .53 (.06)    .53 (.07)    .53 (.03)
  L-RP               .63 (.05)    .62 (.07)    .67 (.06)    .56 (.07)    .62 (.03)
  S-RP               .56 (.05)    .54 (.07)    .57 (.06)    .51 (.07)    .54 (.03)



Note. Standard errors are given in parentheses. The maps were scored according to a liberal quadrant (L-Q) procedure, a stringent 
quadrant (S-Q) procedure, a liberal relative position (L-RP) procedure, and a stringent relative position (S-RP) procedure. All 4 
scoring procedures yielded a significant benefit of Test/Study over Study. 



Discussion 

Testing produced a significant enhancement of map 
learning. To our knowledge, this is the first 
demonstration that the testing effect can be found with a 
visuospatial task that does not require the production of 
an overt verbal response. It is encouraging to know that 
the effect is not limited to memory tasks that require 
writing, typing, or speaking aloud a verbal response, as 
in many of the past studies on this topic (e.g., see 
Dempster, 1996). 

The map learning task differs from other tasks 
involving nonverbal components — for example, face- 
name learning — for which testing effects have been 
demonstrated in past studies (e.g., Carpenter & DeLosh, 
2005). Although a face contains properties that might be 
difficult to verbalize, face-name learning is still similar 
to paired associate verbal learning in that it requires a 
verbal response (i.e., the name). In the map learning 
task, although subjects might use verbal labels to code 
the presence of some geographic features (e.g., church, 
golf course, etc.), verbal descriptions are not likely to 
underlie coding of the spatial properties of those 
features. 

These results help to broaden the boundaries of the 
testing effect beyond just verbal memory tasks, and in 
doing so, encourage future theoretical work to explore 
the means by which tests are beneficial for non-verbal 
learning as well as verbal learning. While some 
hypotheses considered for verbal materials might be 
pertinent to nonverbal materials as well (for reviews of 
hypotheses, see Carpenter & DeLosh, 2006; Carpenter 
et al., in press; Carrier & Pashler, 1992), others would 
seem less applicable. Understanding these boundaries 
may also help in formulating and testing 
neurocomputational models of the phenomenon (e.g., 
see Mozer, Howe, & Pashler, 2004). 

The present study appears to be the first 
demonstration of the testing effect in which subjects — 
in both the test and study conditions — were offered a 
monetary incentive to learn the material as best they 
could. In past studies comparing testing with additional 
studying, subjects may not always have been motivated 
to learn material in the study condition, and thus these 
studies may have underestimated what would be learned 
from additional study in real-world contexts in which 
learners are motivated. By offering a cash bonus for 
learning the maps in both conditions, the present study 
provides a stronger basis for concluding that testing is 
likely to offer a true learning advantage over and above 
the gains attributable to incentives (which may often 
already be present in practical learning contexts). It may 
be a good idea to incorporate similar incentives into 
future testing effect studies in all domains. 

Optimizing the Effect 

It may be possible to refine these methods to 
produce even larger advantages of testing. Based on the 
number of test trials completed, subjects were tested 
less than twice over each feature, on average. It seems 
conceivable that whereas the benefits offered by 
additional study may soon reach diminishing returns as 
more study time is provided, testing would show an 
even greater advantage if time is increased so that each 
feature can be tested more often. 
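
The "less than twice" figure follows directly from the mean number of test trials reported in the Results and the 12 features per map. The few lines below simply reproduce that arithmetic; the variable names are ours.

```python
N_FEATURES = 12  # features per map

# Mean number of test trials completed during Test/Study (from the Results).
mean_trials = {"Map A": 18.65, "Map B": 22.15}

# Average number of times each feature was tested.
tests_per_feature = {name: trials / N_FEATURES for name, trials in mean_trials.items()}
```

This gives roughly 1.55 tests per feature for Map A and 1.85 for Map B, so on average a feature was tested once and only sometimes a second time within the 100 seconds.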

It is also conceivable that the testing advantage 
could be amplified further by allowing subjects to have 
additional tests over just those features they have not yet 
mastered, as in the drop-out (or “learning to criterion”) 
method used in studies of verbal learning (e.g., 
Atkinson, 1972). Such a method would capitalize on the 
indirect benefits of testing by optimizing subsequent 
study time. 

The testing advantage may also be greater when 
retention is tested after longer intervals (as has been 
reported for verbal materials; cf. Roediger & Karpicke, 
2006a; Runquist, 1983; Wheeler et al., 2003). 
Manipulations that prevent subjects from basing their 
covert retrievals on working memory — as they were 
probably able to do, to some degree, in the present 
experiment — may further enhance the effect. For 
example, Whitten and Bjork (1977) found that an 
intervening test was more beneficial to retention if it 
was administered several seconds after the material was 